Search Results: "Julien Danjou"

22 January 2013

Julien Danjou: Extending Swift with middleware: example with ClamAV

In this article, I'm going to explain how you can extend Swift, the OpenStack Object Storage project, so that it performs extra actions on files at upload or download time. We're going to build an anti-virus filter inside Swift. The goal is to refuse uploaded data if it contains a virus. To help us with the virus analysis, we'll use ClamAV.

WSGI, paste and middleware
In a pipeline, a black cat can become a white cat with the help of some middleware.
To do our content analysis, the best place to hook into the Swift architecture is at the beginning of every request, on swift-proxy, before the file is actually stored on the cluster. The Swift proxy uses, like many other OpenStack projects, paste to build its HTTP architecture. Paste uses WSGI and provides an architecture based on a pipeline. The pipeline is composed of a succession of middleware, ending with one application. Each middleware has the chance to look at the request or at the response, can modify it, and then passes it to the following middleware. The last component of the pipeline is the real application, in this case the Swift proxy server. If you've already deployed Swift, you've encountered a default pipeline in the swift-proxy.conf configuration file:
[pipeline:main]
pipeline = catch_errors healthcheck cache ratelimit tempauth proxy-logging proxy-server

This is a really basic pipeline with a few middleware. The first one catches errors, the second one is in charge of returning a 200 OK response if you send a GET /healthcheck request to your proxy server. The third one is in charge of caching, the fourth one is used for rate limiting, the fifth for authentication, the sixth one for logging, and the final one is the actual proxy server, in charge of proxying the request to the account, container, or object servers (the other components of Swift). Of course, we could remove or add any middleware here at our convenience. Be aware that the order matters: for example, if you put healthcheck after tempauth, you won't be able to access the /healthcheck URL without being authenticated!

ClamAV
If you don't know ClamAV, it's an antivirus engine designed for detecting trojans, viruses, malware and other malicious threats. We're going to use it to scan every incoming file. To build the middleware, we'll use the Python binding pyclamd. The API is quite simple, see:
>>> import pyclamd
>>> pyclamd.init_unix_socket('/var/run/clamav/clamd.ctl')
>>> print pyclamd.scan_stream(pyclamd.EICAR)
{'stream': 'Eicar-Test-Signature(44d88612fea8a8f36de82e1278abb02f:68)'}
>>> print pyclamd.scan_stream("safe!")
None

Anatomy of a WSGI middleware
Your WSGI middleware should consist of a callable object. Usually this is done with a class implementing the __call__ method. Here's a basic boilerplate:
class SwiftClamavMiddleware(object):
    """Middleware doing virus scan for Swift."""

    def __init__(self, app, conf):
        # app is the final application
        self.app = app

    def __call__(self, env, start_response):
        return self.app(env, start_response)


def filter_factory(global_conf, **local_conf):
    conf = global_conf.copy()
    conf.update(local_conf)

    def clamav_filter(app):
        return SwiftClamavMiddleware(app, conf)
    return clamav_filter

I'm not going to expand further on why this is built this way, but if you want more info on this kind of filter middleware, you can read the Paste documentation. As it is, this middleware does nothing: it simply passes every request it receives to the final application and returns the result.

Testing our basic middleware
Now is a really good time to add unit tests; you didn't think we were going to write code without tests, did you? Testing a middleware is really easy, and we're going to use WebOb for that.
import unittest
from webob import Request, Response


class FakeApp(object):
    def __call__(self, env, start_response):
        return Response(body="FAKE APP")(env, start_response)


class TestSwiftClamavMiddleware(unittest.TestCase):

    def setUp(self):
        self.app = SwiftClamavMiddleware(FakeApp(), {})

    def test_simple_request(self):
        resp = Request.blank('/',
                             environ={
                                 'REQUEST_METHOD': 'GET',
                             }).get_response(self.app)
        self.assertEqual(resp.body, "FAKE APP")

We create a FakeApp class that represents a fake WSGI application. You could also use a real application, or write a fake application looking like the one you want to test. It'll require more time, but your tests will be closer to reality. Here we write the simplest test we can for our middleware: we just send a GET / request to it, so it passes the request to the final application and returns the result. It is transparent, it does nothing. Now, with that solid base, we'll be able to add more features and test them incrementally.

Plugging ClamAV in
With our base ready, we can start thinking about how to plug ClamAV in. What we want to check here is the content of the file when it's uploaded. If we refer to the OpenStack Object Storage API, a file upload is done via a PUT request, so we're going to limit the check to that kind of request. Obviously, more checks could be added, but we'll keep things simple here for the sake of comprehensibility. With WSGI, the content of the request is available in env['wsgi.input'] as an object implementing a file interface. We'll scan that stream with ClamAV to check for viruses.
import pyclamd
from webob import Response


class SwiftClamavMiddleware(object):
    """Middleware doing virus scan for Swift."""

    def __init__(self, app, conf):
        pyclamd.init_unix_socket('/var/run/clamav/clamd.ctl')
        # app is the final application
        self.app = app

    def __call__(self, env, start_response):
        if env['REQUEST_METHOD'] == "PUT":
            # We have to read the whole content in memory because pyclamd
            # forces us to, but this is a bad idea if the file is huge.
            scan = pyclamd.scan_stream(env['wsgi.input'].read())
            if scan:
                return Response(status=403,
                                body="Virus %s detected" % scan['stream'],
                                content_type="text/plain")(env, start_response)
        return self.app(env, start_response)


def filter_factory(global_conf, **local_conf):
    conf = global_conf.copy()
    conf.update(local_conf)

    def clamav_filter(app):
        return SwiftClamavMiddleware(app, conf)
    return clamav_filter

That's it. We only check PUT requests, and if there's a virus in the file, we return a 403 Forbidden error with the name of the detected virus, entirely bypassing the rest of the middleware chain and the application handling. Then, we can simply test it.
import unittest
from cStringIO import StringIO

import pyclamd
from webob import Request, Response


class FakeApp(object):
    def __call__(self, env, start_response):
        return Response(body="FAKE APP")(env, start_response)


class TestSwiftClamavMiddleware(unittest.TestCase):
    def setUp(self):
        self.app = SwiftClamavMiddleware(FakeApp(), {})

    def test_put_empty(self):
        resp = Request.blank('/v1/account/container/object',
                             environ={
                                 'REQUEST_METHOD': 'PUT',
                             }).get_response(self.app)
        self.assertEqual(resp.body, "FAKE APP")

    def test_put_no_virus(self):
        resp = Request.blank('/v1/account/container/object',
                             environ={
                                 'REQUEST_METHOD': 'PUT',
                                 'wsgi.input': StringIO('foobar'),
                             }).get_response(self.app)
        self.assertEqual(resp.body, "FAKE APP")

    def test_put_virus(self):
        resp = Request.blank('/v1/account/container/object',
                             environ={
                                 'REQUEST_METHOD': 'PUT',
                                 'wsgi.input': StringIO(pyclamd.EICAR),
                             }).get_response(self.app)
        self.assertEqual(resp.status_code, 403)

The first test, test_put_empty, simulates an empty PUT request. The second one, test_put_no_virus, simulates a regular PUT request with a simple file containing no virus. Finally, the third and last test simulates the upload of a virus using the EICAR test file. This is a special test file that is recognized as a virus, even though it's not a real one. Very handy for testing virus detection software!

Configuring Swift proxy
Our middleware is ready! We can configure Swift's proxy server to use it. We need to add the following lines to our swift-proxy.conf to teach it how to load the filter:
[filter:clamav]
paste.filter_factory = swiftclamav:clamav_filter

We'll assume that our Python module is named swiftclamav here. Now that we've defined our filter and how to load it, we can use it in our pipeline:
[pipeline:main]
pipeline = catch_errors healthcheck cache ratelimit tempauth clamav proxy-logging proxy-server

Just before reaching the proxy-server, and after the user has been authenticated, the content will be scanned for viruses. It's important to put this after authentication, because otherwise we may scan content that will get rejected by the tempauth module, thus scanning for nothing!

Beyond scanning
And voilà, we now have a simple middleware testing uploaded content and refusing infected files. We could enhance it with various other things, like configuration handling, but I'll leave that as an exercise for the interested readers. We didn't exploit it here, but note that you can also manipulate request headers and modify them if needed. For example, we could have added a header X-Object-Meta-Scanned-By: ClamAV to indicate that the file has been scanned by ClamAV (see the sketch below). You should now be able to build your own middleware doing whatever you want with uploaded data. Happy hacking!
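To illustrate that last idea, here is a minimal, hypothetical sketch (not part of the code above) of how the __call__ method could tag clean uploads; the header name and value are only examples, and it relies on WSGI exposing request headers as HTTP_* keys in the environment:

    def __call__(self, env, start_response):
        if env['REQUEST_METHOD'] == "PUT":
            scan = pyclamd.scan_stream(env['wsgi.input'].read())
            if scan:
                return Response(status=403,
                                body="Virus %s detected" % scan['stream'],
                                content_type="text/plain")(env, start_response)
            # Hypothetical addition: mark clean uploads with a metadata
            # header so the stored object records that it was scanned.
            env['HTTP_X_OBJECT_META_SCANNED_BY'] = 'ClamAV'
        return self.app(env, start_response)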

11 January 2013

Julien Danjou: Overriding cl-json object encoding

CL-JSON provides an encoder for Lisp data structures and objects to the JSON format. Unfortunately, in some cases, its default encoding mechanism for CLOS objects isn't exactly doing the right thing. I'll show you how Common Lisp makes it easy to change that.

Identifying the problem: CL-JSON & CLOS
CL-JSON's mechanism for encoding CLOS objects is really neat. Let's see how it works for a simple case:
(defclass kitten ()
  ((tail :initarg :tail)))

(json:encode-json-to-string (make-instance 'kitten :tail 'black))

will produce:
{"tail":"black"}

Still using CL-JSON, we can also decode the JSON object to a CLOS object:
(slot-value
 (json:with-decoder-simple-clos-semantics
   (json:decode-json-from-string "{\"tail\":\"black\"}"))
 :tail)

That code will return "black". Note that it's also possible to specify which class should be used when decoding objects, but that's beyond the purpose of this article.

Postmodern
Now, let's introduce Postmodern, a wonderful Common Lisp system providing access to the wonderful PostgreSQL database. It also provides a simple system to map database rows to CLOS classes, called DAO (Database Access Objects). With this, we can easily store our kitten in a table.
(defclass kitten ()
  ((tail :initarg :tail))
  (:metaclass postmodern:dao-class))

If we try to encode this to JSON, it will produce exactly the same result as seen previously. The problem is what happens when one of our columns has a NULL value. Postmodern encodes this using the :null symbol. So this code:
(defclass kitten ()
  ((tail :initarg :tail :col-type (or s-sql:db-null text)))
  (:metaclass postmodern:dao-class))

(postmodern:deftable kitten
  (postmodern:!dao-def))

(postmodern:connect-toplevel )

(postmodern:create-table 'kitten)

(json:encode-json-to-string
 (postmodern:make-dao 'kitten))

will return:
"{\"tail\":\"null\"}"

Fail! The fact that the column is NULL is represented by the :null symbol, and CL-JSON encodes all symbols as strings. This is not at all what we want here!

Overriding encode-json
CL-JSON provides and uses the encode-json method to encode all kinds of objects. It is defined as a generic function, and a lot of different methods are implemented to handle the different standard Common Lisp types. The one used for standard-object is defined like this:
(defmethod encode-json ((o standard-object)
                        &optional (stream *json-output*))
  "Write the JSON representation (Object) of the CLOS object O to
STREAM (or to *JSON-OUTPUT*)."
  (with-object (stream)
    (map-slots (stream-object-member-encoder stream) o)))

All we need to do here is to create a new method for our kitten objects that correctly handles the :null case.
(defclass kitten ()
  ((tail :initarg :tail :col-type (or s-sql:db-null text)))
  (:metaclass postmodern:dao-class))

(export 'kitten)

;; Switch package just to define the new method
(in-package :json)
(defmethod encode-json ((o cl-user:kitten)
                        &optional (stream json:*json-output*))
  "Write the JSON representation (Object) of the postmodern DAO CLOS object
O to STREAM (or to *JSON-OUTPUT*)."
  (with-object (stream)
    (map-slots (lambda (key value)
                 (as-object-member (key stream)
                   (encode-json (if (eq value :null) nil value) stream)))
               o)))

;; Go back into our package
(in-package :cl-user)

(postmodern:deftable kitten
  (postmodern:!dao-def))

(postmodern:connect-toplevel )

(postmodern:create-table 'kitten)

(json:encode-json-to-string
 (postmodern:make-dao 'kitten))

With that new method, as soon as we encounter a :null symbol as a value for an object's slot, we replace it with nil. Now if we try to encode another kitten, we'll get:
{"tail":null}

which is far better for our JavaScript data consumers! In the end, I think this kind of trick is so easily feasible because of the way CLOS implements generic methods. The fact that methods don't belong to any class makes extending any program, library or class so much easier. Doing this in another language like Java would likely be impossible, and in Python it would probably not be as clean as it is in Common Lisp. The ability to teach any library how it should handle your class just by defining a new method is really handy!

4 January 2013

Julien Danjou: Integrating cl-irc and cl-async

Recently, I've started programming in Common Lisp. My idea here is to use cl-irc, an IRC library, inside an event loop. This can be really useful, for example to trigger actions based on time, using timers.

Creating a connection
The first step is to create a basic cl-irc:connection object on our own. This can be achieved easily with this:
(require :cl-irc)

(defun connect (server)
  (cl-irc:make-connection :connection-type 'cl-irc:connection
                          :client-stream t
                          :network-stream ?
                          :server-name server))

This will return a cl-irc:connection object, logging to stdout (:client-stream t) and having the server name server. Note that the server name could be any string. You probably noticed the ? I used as :network-stream value. This is not a real and working value: this should be a stream established to the IRC server you want to chat with. This is where we'll need to use cl-async:tcp-connect to establish a TCP connection. As you can read in this function's documentation, all we need to pass is the server address, two callbacks for read and general events, and the :stream option to get a stream rather than a socket. So you would do something like:
(require :cl-irc)
(require :cl-async)

(defun connection-socket-read (socket stream)
  (format t "We should read the IRC message from ~a ~%" stream))

(defun connection-socket-event (ev)
  (format t "Socket event: ~a~%" ev))

(defun connect (server &optional (port 6667))
  (cl-irc:make-connection :connection-type 'cl-irc:connection
                          :client-stream t
                          :network-stream (as:tcp-connect server port
                                                          #'connection-socket-read
                                                          #'connection-socket-event
                                                          :stream t)
                          :server-name server))

(as:start-event-loop (lambda () (connect "irc.oftc.net")))

If you run this program, it will connect to the OFTC IRC server, and then notice you each time the server is sending you a message. Therefore our problem here is how we you treat the message read from the stream in connection-socket-read and handle them in the name of our connection object you used? We can't link both together at this point. We can't build a closure, because as the time we use as:tcp-connect we don't have the cl-irc:connection instance. Also we can't change easily the read-cb parameter of our network-stream established by as:tcp-connect, simply because cl-async doesn't use to do allow that. Building a closure So one solution here is to hack cl-irc:make-connection so we can build an cl-irc:connection instance without providing in advance the network-stream, allowing us to build a closure including the cl-irc:connection to read event for. This is what we're going to do in the connect function.
(require :cl-irc)
(require :cl-async)
(require :flexi-streams)

(defun connection-socket-read (connection)
  (loop for message = (cl-irc::read-irc-message connection)
        while message
        do (cl-irc:irc-message-event connection message)))

(defun connection-socket-event (ev)
  (format t "Socket event: ~a~%" ev))

(defun connect (server port nickname
                &key
                  (username nil)
                  (realname nil)
                  (password nil))
  ;; Build an instance of cl-irc:connection, without any network/output stream
  (let* ((connection (make-instance 'cl-irc:connection
                                    :user username
                                    :password password
                                    :server-name server
                                    :server-port port
                                    :client-stream t))
         ;; Use as:tcp-connect to build our network stream, and build a
         ;; closure calling connection-socket-read with our connection
         ;; as argument
         (network-stream (as:tcp-connect server port
                                         (lambda (socket stream)
                                           (declare (ignore socket stream))
                                           (connection-socket-read connection))
                                         #'connection-socket-event
                                         :stream t)))
    ;; Set the network stream on the connection
    (setf (cl-irc:network-stream connection) network-stream)
    ;; Set the output stream on the connection
    (setf (cl-irc:output-stream connection)
          ;; This is grabbed from cl-irc:make-connection
          (flexi-streams:make-flexi-stream
           network-stream
           :element-type 'character
           :external-format '(:utf8 :eol-style :crlf)))

    ;; Now handle the IRC protocol authentication pass
    (unless (null password)
      (cl-irc:pass connection password))
    (cl-irc:nick connection nickname)
    (cl-irc:user- connection (or username nickname) 0 (or realname nickname))
    connection))

(as:start-event-loop (lambda () (connect "irc.oftc.net" 6667 "jd-blog")))

And here we are! If we run this, we're now using an event loop to run cl-irc. Each time the socket has something to read, the function connection-socket-read will be called on the non-blocking socket. If there's no message to be read, the function will return and the loop will continue to run.

Using timers
You can now replace the last line with this:
(defun say-hello (connection)
  (cl-irc:privmsg connection "#jd-blog" "Hey I read your blog!")
  (as:delay (lambda () (say-hello connection)) :time 60))

(as:start-event-loop (lambda ()
                       (let ((connection (connect "irc.oftc.net" 6667 "jd-blog")))
                         (cl-irc:join connection "#jd-blog")
                         (say-hello connection))))

This will connect to the IRC server, join a channel and then say the same sentence every minute. Challenge accomplished! And I'd like to thank Andrew Lyon, the author of cl-async, who has been incredibly helpful with my recent experiments in this area.

24 December 2012

Julien Danjou: Ceilometer bug squash day #1

In order to start the year in a good mood, what's better than squashing some bugs on OpenStack? Therefore, the Ceilometer team is pleased to announce that it is organizing a bug squashing day on Friday 4 January 2013. We wrote an extensive page about how you can contribute to Ceilometer, from updating the documentation to fixing bugs: there's a lot you can do. We have good support for Ceilometer built into Devstack, so installing a development platform is really easy. The main goal of this bug day will be to put Ceilometer in the best possible shape before the grizzly-2 milestone arrives (10 January 2013). This version of Ceilometer will aim to keep compatibility with Folsom, so early deployers can enjoy some of our new features before upgrading to Grizzly. After that date, we'll start merging more extensive changes. We'll be hanging out on the #openstack-metering IRC channel on Freenode, as usual, so feel free to come by and join us!

16 November 2012

Julien Danjou: Logitech Unifying devices support in UPower

A few months ago, I wrote about my reverse engineering attempt on Logitech Unifying devices. Back then, I concluded my post with big hopes for the future after receiving a document with some parts of the HID++ 2.0 specification from Logitech. A couple of weeks ago, some of my summer work was merged into UPower, adding battery support for some Logitech devices.

HID++
As I discovered late in my first reverse engineering attempt, Logitech developed a custom HID protocol named HID++. This protocol exists in two versions, 1.0 and 2.0. Some devices speak version 1 of the protocol (like my M705 mouse) and some others speak version 2 (like my K750 keyboard). Recently, I've been able to get in touch with a Logitech engineer who worked on the Linux support for the Unifying receiver, and he has been really helpful and shared some details about this protocol with me. Logitech decided about a year ago to publish their HID++ specification publicly, but still hasn't done it: the internal review needed to publish such documents hasn't been done yet. The only published draft is just an extract of the specification, with even some typos in it, as I discovered. Some other documents have been published recently, but I didn't have the time to review them. They contain the HID++ 1.0 specifications and some details I asked for about the K750 keyboard.

UPower support
It took me some time to get a full understanding of the protocol, its different versions, etc. After reverse engineering my K750 keyboard, I also reverse engineered the data stream used to get my M705 mouse battery status. I also received some information about the HID++ 1.0 protocol, so I've been able to discover a bit more about what the packets mean. Most of my discoveries are now used to do proper #define in up-lg-unifying.c so the code makes more sense. My first patch implements a new property for UPower devices, named luminosity, which is used with the K750 keyboard to report the light level received. The second patch adds support for Logitech Unifying devices (over USB only) and should work with at least the Logitech M705 and K750 devices. This should be available with the next version of UPower, which should be 0.9.19.
gnome-power-statistics for K750
So far, Logitech has been kind enough to help me understand part of the protocol and even sent me a few devices so I can play with them and test my work. Unfortunately, this will probably require some work and time, and so far Logitech has not been able to help with that. There should be enough information out there to at least add battery support for HID++ 2.0 devices, and probably a few other things too. I hope I'll get the time to do this at some point, but feel free to beat me in this race!

6 November 2012

Raphaël Hertzog: My Free Software Activities in October 2012

This is my monthly summary of my free software related activities. If you're among the people who made a donation to support my work (120.46 €, thanks everybody!), then you can learn how I spent your money. Otherwise it's just an interesting status update on my various projects.

Dpkg
At the start of the month, I reconfigured dpkg's git repository to use KGB instead of the discontinued CIA to send out commit notices to IRC (on #debian-dpkg on OFTC, aka irc.debian.org). I didn't do anything else that affects dpkg, and I must say that Guillem does not make it easy for others to get involved. He keeps all his work hidden in his private "for 1.17.x" branch and refuses to open an official jessie branch, as can be seen from the lack of answer to this mail. On the bright side, he deals with almost all incoming bugs even before I have a chance to take care of them. But it's a pity that I can never review any of his fixes because they are usually pushed shortly before an upload.

Misc packaging
I helped to get #689336 fixed so that the initrd properly sets up the keymap before asking for a passphrase for an encrypted partition. Related to this, I filed #689722 so that cryptsetup gains a dependency ensuring that the required tools for keymap setup are available. I packaged a new upstream version of zim (0.57) and also a security update for python-django that affected both Squeeze and Wheezy. I uploaded an NMU of revelation (0.4.13-1.2) so that it doesn't get dropped from Wheezy (it was on the release team's list of leaf packages that would be removed if unfixed), since my wife is using it to store her passwords. I sponsored a new upstream version of ledgersmb.

Debian France
We managed to elect new officers for Debian France. I'm taking over the role of president, Sylvestre Ledru is the new treasurer and Julien Danjou is the new secretary. Thank you very much to the former officers: Carl Chenet, Aurélien Jarno and Julien Cristau. We're in the process of managing this transition, which will be completed during the next mini-DebConf in Paris so that we can exchange some papers and the like. Among the first tasks that I have set myself is recruiting two new members for the board of directors, since we're only 7 and there are 9 seats. I made a call for volunteers and we have two volunteers. If you want to get involved and help Debian France, please apply by answering that message as soon as possible.

The Debian Handbook
I merged the translations contributed on debian.weblate.org (which led me to file this wishlist bug on Weblate itself) and I fixed a number of small issues that had been reported. I made an upload to Debian to incorporate all those fixes. But this is still the book covering Squeeze, so I started to plan the work to update it for Wheezy, and with Roland we have decided who is going to take care of updating each chapter.

Librement
Progress is annoyingly slow on this project. Handling money for others is highly regulated, at least in the EU apparently. I only wanted an escrow account to secure the money of users of the service, but opening this account requires either being certified as a payment institution by the Autorité de contrôle prudentiel, or getting an exemption from the same authority (covering only some special cases), or signing a partnership with an established payment institution. Being certified is out of scope for now since it requires a minimum of 125000 EUR in capital (which I don't have). My bank can't sign the kind of partnership that I would need. So I have to investigate whether I can make it fit in the limited cases of exemption, or I need to find another payment institution that is willing to work with me. Gittip uses Balanced, a payment service specialized in marketplaces, but unfortunately it's US-only if you want to withdraw money from the system. I would love a similar service in Europe. If I can't position Librement as a marketplace for the free software world (and save each contributor the hassle of opening a merchant account), then I shall fall back to the solution where Librement only provides the infrastructure but no account, and developers who want to collect donations will have to use either Paypal or any other supported merchant account to collect funds. That's why my latest spec updates concerning the donation service and the payment service mention Paypal and the possibility of choosing your payment service for your donation form.

Thanks
See you next month for a new summary of my activities.


Julien Danjou: OpenStack France meetup #2

I was at the OpenStack France meetup #2 yesterday evening. It was a wonderful evening, talking about OpenStack and more with around 30-40 people. Nick Barcet and I presented Ceilometer and received some good feedback about it. We should also thank Nebula, who sponsored the evening, and Erwan Gallen, since it was nicely organized, and free beers are always enjoyable. For people interested, the slides of our Ceilometer presentation are available. This is a lighter and fresher version of the slides used by Nick and Doug at the OpenStack Design Summit.

22 October 2012

Julien Danjou: Inside Synaps, a CloudWatch-like implementation for OpenStack

A few days ago, Samsung released the source code of Synaps, an implementation of the Amazon Web Services CloudWatch API for OpenStack. Being a developer on the Ceilometer project, I've been curious to look at this project and how it could overlap with Ceilometer or other projects like Heat.

What is CloudWatch?
CloudWatch is a monitoring system provided by Amazon on its Web Services platform to monitor services. It allows you to get notifications and trigger an action on certain thresholds. For example, this can be used to scale your architecture by monitoring the number of requests it receives and its general load, and starting new servers accordingly.

Synaps
Synaps is written in around 7k lines of Python (28% of which are comments), reuses at least one common module of OpenStack (openstack.common.cfg) and copies some modules from Nova. One thing that strikes me is that there seem to be only a few unit tests compared to most OpenStack projects. Also, many parts of the code and documentation contain text written in Korean, which won't be very helpful for most people! :-) It uses some external technologies, like Storm and Cassandra to store its persistent data, and Pandas to do data analysis. The API server provides an EC2-compatible API only: no OpenStack-specific API. This is probably not a bad thing for now, since I am not aware of any work in this direction. The API accesses the Cassandra back-end directly for read operations, but relies on RPC to do writes. This way, a set of daemons handles the writes using the Storm part of Synaps and does data aggregation. The authentication only supports LDAP, but it should still be possible to add a driver for Keystone. A Java and a Python SDK are provided to record metrics into Synaps, but there's not enough documentation for them to be useful.

Overlap with Heat
For now, there's not a lot of overlap with Heat, because Heat does not completely implement the CloudWatch API; Heat actually still misses a lot of the CloudWatch functions. But as soon as it implements the CloudWatch API completely, the overlap with Synaps will be complete in this regard. One divergence point, however, is that Heat uses RPC to access data from the storage back-end via its engine (the central daemon), whereas Synaps directly connects to Cassandra. Also, Heat relies on SQLAlchemy, like most OpenStack projects needing a database.

Overlap with Ceilometer
One of the goals of Ceilometer is to provide data probes and pollsters for all OpenStack components (Nova, Swift, Quantum…), whereas Synaps lets OpenStack users put any kind of metric inside it, and therefore doesn't provide anything of the sort for now. But the storage of metrics is the main common point between Synaps and Ceilometer. Synaps chose only one technology, Cassandra, to store its metrics, whereas Ceilometer took care of building an abstraction layer for the storage engine. Ceilometer currently allows an operator to use SQL or MongoDB, but Cassandra could likely be added. Data metric consolidation is done by Synaps. This makes sense, since Synaps doesn't need the full data history to trigger alarms. On the opposite side, Ceilometer needs a full history to allow things like billing, and doesn't do any aggregation on data. Also, in Synaps, the data analysis is done using Pandas. This means the data are retrieved from the Cassandra back-end and then transformed by Pandas inside Synaps into something else. It's likely that in such a case, Synaps should use CQL to achieve that. Ceilometer manipulates the data near its storage: this means the computations are done by the back-end to be efficient (SQL, MapReduce…).

Conclusion
Considering Samsung open-sourced Synaps late in the development process, I don't feel like they aimed to have it become a core component. This is always sad, because the effort put into this implementation is big, and it would probably have cost little to add some abstraction layers to follow what other OpenStack projects do. But this takes time and energy, and it's understandable that Samsung didn't want to achieve this in a short time frame. There's a part of the code and architecture that overlaps with Ceilometer and Heat. Ceilometer is becoming a specialized point to store data metrics from any source, so it's sad, but understandable, that Synaps did not try to reuse it. Fortunately, Heat is working with Ceilometer to achieve exactly that. This means OpenStack would have only one metrics storage point, used for billing, monitoring and alarming. Therefore, I think Synaps is an implementation of CloudWatch that should be looked at as an inspiration for Heat and Ceilometer to build a better and more integrated solution!

12 October 2012

Julien Danjou: Ceilometer 0.1 released

After 6 months of development, we are proud to publish the first release of Ceilometer, the OpenStack Metering project. This is a first and amazing milestone for us: we follow all other projects by releasing a version for Folsom! Using Ceilometer, you should now be able to meter your OpenStack cloud and retrieve its usage to build statistics or bill your customers! You can read our announcement on the OpenStack mailing list.

Architecture
We spent a good amount of time defining and refining our architecture. One of its important points is that it has been designed to work without modifying any of the existing core components. Patching OpenStack components in an intrusive way to meter them was not an option for now, simply because we had no legitimacy to do so. This may change in the future, and this will likely be discussed next week during the OpenStack Summit.

Meters
Initially, we defined a bunch of meters we'd like to have for a first release, and in the end, most of them are available. Some of them are still missing, like the OpenStack Object Storage (Swift) ones, mainly due to lack of interest from the involved parties so far. Anyhow, with this first release, you should be able to meter your instances, their network usage, memory and CPU. Images, networks and volumes and their CRUD operations are metered too. For more detail, you can read the complete list of implemented meters.

REST API
The HTTP REST API has been partially implemented. The provided methods should allow basic integration with a billing system. DreamHost is using Ceilometer in their deployment architecture and coupling it with their billing system!

Towards Grizzly
We don't have a clear and established road-map for Grizzly yet. We already have a couple of patches waiting in the queue to be merged, like the use of Keystone to authenticate API requests and the removal of Nova DB access. On my side, these last days I've been working on a small debug user interface for the API. The Ceilometer API server will return this interface if you do an API request from a browser (i.e. requesting text/html instead of application/json). I hope this will help newcomers discover the Ceilometer API more easily and leverage it to build powerful tools! Anyhow, we have tons of ideas and work to do, and I'm sure the upcoming weeks will be very interesting. Also, we hope to become an OpenStack incubated project soon. So stay tuned!

5 September 2012

Raphaël Hertzog: My Debian Activities in August 2012

This is my monthly summary of my Debian related activities. If you're among the people who made a donation to support my work (88.41 €, thanks everybody!), then you can learn how I spent your money. Otherwise it's just an interesting status update on my various projects. This month has again been a short one since I have mostly been on vacation during the last 2 weeks.

Dpkg
Things are relatively quiet during the freeze. I only took care of fixing 3 bugs: a regression of 3.0 (quilt) (#683547), a segfault of dpkg-query -W -f (commit) and a bad auto-completion for French users (#685863).

Testing the upgrade to wheezy
We got several reports of wheezy upgrades that failed because dpkg ran a trigger while the dependencies of the package with pending triggers were not satisfied. Unfortunately, fixing this in dpkg is not without problems (see #671711 for details), so Guillem decided to defer this fix to Jessie. My suggestion of an intermediary solution has fallen into limbo. Instead we now have to find solutions for each case where this can fail (example of failure: 680626). Another way to avoid those errors is to ensure that triggers are run as late as possible. We can improve this in multiple ways. The first way is to modify most triggers so that they use the interest-noawait directive. In that case, the packages activating the trigger will be immediately marked as configured (instead of triggers-awaited) and the trigger will thus not need to be run as part of further dependency solving logic. But as of today, there's no package using this new feature yet despite a nudge on debian-devel-announce. :-( The second way is to modify APT to use dpkg --no-triggers, and to leave the trigger processing for the end (with a last dpkg --configure -a call). I requested this early in the wheezy timeframe, but for various reasons the APT maintainers did not act on it. I pinged them again in #626599 but it's now too late for wheezy. I find this a bit sad because I have been using those options for the entire wheezy cycle and it worked fine for me (and I used them for a dist-upgrade on my wife's laptop too). It would have been good to have all this in place for wheezy so that we don't have to suffer from the same problems during the jessie upgrade, but unless someone steps up to steer those changes, it seems unlikely to happen. Instead, we're back to finding clumsy work-arounds in individual packages.

Packaging
I prepared security updates for python-django (1.4.1 for unstable, 1.2.3-3+squeeze3 for stable). I packaged a new upstream version of cpputest (3.2-1). I reviewed ledgersmb 1.3.21-1 prepared by Robert James Clay and asked him to prepare another version with further fixes. I released nautilus-dropbox 1.4.0-2 with supplementary changes of my own to support https_proxy and to display better diagnostic information when the download fails. With the help of Paul van der Vlis and Michael Ziegler, we did what was required to be able to migrate python-django-registration 0.8 to Wheezy even though it's a new upstream version with backwards incompatible changes. Thanks to Adam D. Barratt, who unblocked the package, we now have the right version in Wheezy despite the fact that I missed the freeze deadline.

Debian France
Julien Cristau reminded the board of Debian France that we have to elect officers (President, Secretary, Treasurer) as the current officers have withdrawn. I was somewhat afraid that nobody would take over, so I pinged each member to try to get new volunteers. We now have volunteers (me, Julien Danjou and Sylvestre Ledru) and we're waiting until Julien finds some time to run the election.

Misc
With the help of DSA, I set up antispam rules for the owner@packages.qa.debian.org alias because I was getting tired of the amount of spam. In the process, they asked me to write a wiki page for dsa.debian.org to document everything so that they can refer to it for future queries. I did it, but it looks like they have not applied my patch yet. I also tested an upstream patch for gnome-keyring (see bugzilla #681081) that reintroduces support for forgetting GPG passphrases after a specified amount of time.

Thanks
See you next month for a new summary of my activities.


29 August 2012

Julien Danjou: Gnus notifications

Today, I've merged my Gnus notifications module into the Gnus git repository. This way, it will be available for everybody in Emacs 24.2.
gnus-notifications example
This module allows you to be notified via notifications-notify (the Emacs implementation of the Freedesktop desktop notifications) of new messages received in Gnus. It can also retrieve contacts' photos via gravatar.el and google-contacts.el to include them in the notification. To enable it in Emacs > 24.1, you just have to add the following line to your Gnus configuration file:
(add-hook 'gnus-after-getting-new-news-hook 'gnus-notifications)
If you want to download it and use it stand-alone for a previous Emacs version, you can fetch the latest file revision and load it before adding the previously given line.

11 August 2012

Julien Danjou: Sony Vaio Z Debian Linux support

I had to install Debian Wheezy on a brand new Sony Vaio Z laptop with the new Ivy Bridge architecture (SVZ1311C5E). I'll talk about this here, because it's always nice to know that new hardware works fine (or not) under Debian.
Sony Vaio Z 2012
The laptop is delivered with Windows 7, which I decided to remove entirely anyway and replace with Debian. I installed it with Linux 3.2 and then ran Linux 3.4, 3.5 and 3.6-rc1.

USB booting
Don't ask me why, but neither an Ubuntu nor a Debian USB installation booted; at best it blocked at SYSLINUX, or at a black screen. This does not work. I had to use PXE to install Debian.

Storage
The only surprising thing is that the 128 GB SSD storage is actually made of two 64 GB Samsung SSDs aggregated in a RAID 0 using Intel Rapid Storage Technology, previously known as Intel Matrix. This is supported by Linux using the dm-raid module. So this is a fake RAID, and you can see both drives as sda and sdb under Linux anyway. Unfortunately, this kind of RAID is not supported correctly by GRUB, and I was unable to install it this way. Therefore, I decided to remove this fake RAID entirely (which is possible via the BIOS) and use a Linux software md RAID 0 instead, plus crypto on top of it. That I know well and I trust. :)

Graphics
The Intel HD Graphics 4000 works fine. I'm also using the HDMI output, which works fine. There was some GPU hanging (as seen on screen and in kernel logs) in Linux up to 3.4, but with versions 3.5 and above, I haven't seen any problem so far.

Sound
The Intel HDA sound card works pretty well, both for playing and recording. The main problem is that I hear a constant noise on the speakers, but tweaking the ALSA mixers ends it at some point. There's still probably a bug, not yet resolved in Linux 3.6-rc1.

Keyboard
The keyboard works fine, and the back-light too, via the sony-laptop kernel module. Wonderful.

Touchpad
The touchpad works fine.

Fingerprint
It does not work, and is not supported according to my research. Not that I care, but don't count on it. It's an AuthenTec AES1660.

Webcam
It works perfectly.

USB
Well, USB 3.0 does not work. I had to disable XHCI in the BIOS and use the 2 ports as standard USB 2.0, otherwise I would just get errors from the kernel. It's still not working with Linux 3.6-rc1, I have no clue how to debug it, and I do not use USB 3.0 yet, so…

WiFi
The WiFi module (based on iwlwifi) works fine. The only problem with NetworkManager is that sony-laptop offers a second rfkill switch and NM does not know how to handle it correctly. A bug is open about this and I hope to be able to write a patch or something at some point. Also, there seems to be some quality issue with the iwlwifi driver and 802.11n at this point. I'm losing the connection quite often when the signal drops below 40%. Loading the module with 11n_disable=1 helps a lot (a configuration sketch appears at the end of this post).

Ethernet
The gigabit Realtek Ethernet controller works perfectly.

Card reader
Works perfectly.
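As a small illustration, not part of the original post: the 11n_disable=1 option mentioned in the WiFi section can be made persistent with a modprobe configuration snippet; the file name below is arbitrary, any file in /etc/modprobe.d/ ending in .conf is read.

# /etc/modprobe.d/iwlwifi.conf (hypothetical file name)
# Disable 802.11n in the iwlwifi driver to work around frequent disconnects
options iwlwifi 11n_disable=1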

27 July 2012

Julien Danjou: Ceilometer, the OpenStack metering project

For the last months, I've been working on a metering project for OpenStack, so it's time to talk a bit about it. OpenStack is a growing cloud platform providing IaaS. A problem easily identified by everyone building a public cloud platform is that nothing is provided to retrieve the platform usage data. Some data are available in some places, but not everything is, and you have to do a lot of processing across the various components to get something useful in the end. But in order to bill customers that are using your public cloud platform, you need to do this. In this regard, a lot of companies running public OpenStack-based infrastructures wrote their own solution to cover this functional area and become able to bill their customers. To avoid everybody building and maintaining such a stack in their corner, the Ceilometer project has been created. The project aims to cover the metering aspect of the OpenStack components, pulling usage data from every component and storing it in a single place. It then offers a retrieval point for this data via a REST API. The initial specifications were written in April this year, and the actual implementation started in May. The project is currently worked on by me, DreamHost and Canonical. We have already designed an architecture that we are implementing, and we hope to release a first usable version with Folsom.
Ceilometer architecture
I did a presentation of this project yesterday at XLCloud, which has been very well received. If you are interested in helping us and contributing, feel free to join us during one of our weekly IRC meetings or fix some bugs. :-)

24 July 2012

Julien Danjou: Emacs configuration published

I've finally published my Emacs configuration. This took me a while, since I had personal information inside (like passwords). Recently, I've been able to move them away and can now publish everything in my Git repository. It's probably not yet usable from scratch, since I didn't include the bootstrap code for el-get. But you can at least lurk and grab some ideas or lines of code. And do not hesitate to ask me anything about it! Note that I'm using Emacs development version (trunk), so it's possible that some things do not work with (old) released Emacs versions.

21 July 2012

Julien Danjou: ERC notifications

Today, I've merged my erc notifications module into Emacs trunk. This way, it will be available for everybody in Emacs 24.2.
erc-notifications example
This module allows you to be notified via notifications-notify (the Emacs implementation of the Freedesktop desktop notifications) of private messages received on IRC, or when your nickname is mentioned on a channel. To enable it in Emacs > 24.1, you just have to add the following line to your configuration file:
(add-to-list 'erc-modules 'notifications)
If you want to download it and use it stand-alone for a previous Emacs version, you can fetch the latest file revision and load it before adding the previously given line.

9 July 2012

Julien Danjou: Logitech K750 keyboard and Unifying Receiver Linux support

A year ago, I bought a Logitech Wireless Solar Keyboard K750. I'm particularly picky about keyboards, but this one is good. It has an incredibly useful feature: while being wireless, it has no need for disposable or rechargeable batteries, it uses solar power!
Logitech Wireless Solar Keyboard K750
My problem is that there's obviously no way to know the battery status from Linux, the provided application only working on Windows. And one dark night, while fragging on QuakeLive, my keyboard stopped working: it had no battery left. This activity being quite energy consuming, it emptied the whole battery. Someone should write code to get the battery status and light meter from Linux: challenge accepted!

How the keyboard works
Logitech Unifying Receiver
This keyboard, like many of the new wireless devices from Logitech, uses the Unifying interface. It's a USB receiver to which up to 6 different devices (mice, keyboards…) can be attached. On old Linux kernels, the Unifying receiver is recognized as only one keyboard and/or one mouse device. Recently, a driver called hid-logitech-dj has been added to the Linux kernel. With this driver, each device attached to the receiver is recognized as a separate device.

What the Logitech application does
Logitech Solar App
The Logitech application under Windows works this way: you launch it, and it displays the battery charge level. On the keyboard, there's a special "light" button (up right). When pressed, a LED lights up on the keyboard: green if the keyboard is receiving enough light and is charging, red if the keyboard does not receive enough light and is therefore discharging. Pushing this same button while the application is running activates the light meter: the application will tell you how many lux your keyboard is receiving.

Let's reverse engineer this
As far as I know, there's nothing in the USB HID protocol that handles this kind of functionality (battery status, light meter…) in a standard way. So the first task to accomplish is, unfortunately, to reverse engineer the program. I discovered a bit too late that Drew Fisher did a good presentation on USB reverse engineering at 28c3. You might want to take a look at it if you want to do USB reverse engineering. I did not need it, but I learned a few things. Anyway, my plan was the following: run the Logitech application inside a virtual machine running Windows, give it direct access to the USB keyboard, and sniff what happens on the USB wire. To achieve that, you need a virtual machine emulator that can do USB pass-through. Both KVM and VirtualBox can do that, but VirtualBox works much better with USB and allows hot-plugging and unplugging of devices, so I used it. To sniff what happens on the USB bus, you need to load the usbmon Linux kernel module. Simply doing modprobe usbmon will work. You can then use Wireshark, which knows how to use usbmon devices and understands the USB protocol.

USB stuff you need to know
You don't need to know much about USB to understand what I'll write below, but for the sake of comprehensibility I'll write a couple of things here before jumping in. To communicate with a USB device, we communicate with one of its endpoints. Endpoints are regrouped into an interface. Interfaces are regrouped into a configuration. A device might contain one or several configurations.
There are also several types of packets in the USB wire protocol, and at least two of them interest us here: control packets and interrupt packets. All of this and more is well (and better) explained in chapter 13 of Linux Device Drivers, Third Edition.

Sniffed data
Once everything was set up, I ran my beloved Wireshark. There's a URB of type interrupt, carrying some data, sent each time you press any key. Therefore I advise you to plug in another keyboard (or use the laptop keyboard if you're doing this on a laptop), otherwise you'll go crazy trying to sniff the keyboard you're typing on. At this point, just launching the application generates a bunch of USB traffic. Pressing the "light" button on the keyboard makes even more USB packets come in and out. The interesting packets that I noticed once I excluded the noise are described below. With all this, the next step was clear: understand the packets and reproduce that exchange under Linux.

What the packets mean
The "go for the light meter" packet
The packet sent from the computer to the keyboard is the following.
Frame 17: 71 bytes on wire (568 bits), 71 bytes captured (568 bits)
    Frame Length: 71 bytes (568 bits)
    Capture Length: 71 bytes (568 bits)
USB URB
    URB id: 0xffff880161997240
    URB type: URB_SUBMIT ('S')
    URB transfer type: URB_CONTROL (0x02)
    Endpoint: 0x00, Direction: OUT
        0... .... = Direction: OUT (0)
        .000 0000 = Endpoint value: 0
    Device: 6
    URB bus id: 1
    Device setup request: relevant (0)
    Data: present (0)
    URB sec: 1340124450
    URB usec: 495643
    URB status: Operation now in progress (-EINPROGRESS) (-115)
    URB length [bytes]: 7
    Data length [bytes]: 7
    [Response in: 18]
    [bInterfaceClass: HID (0x03)]
    URB setup
        bmRequestType: 0x21
            0... .... = Direction: Host-to-device
            .01. .... = Type: Class (0x01)
            ...0 0001 = Recipient: Interface (0x01)
    bRequest: SET_REPORT (0x09)
    wValue: 0x0210
        ReportID: 16
        ReportType: Output (2)
    wIndex: 2
    wLength: 7
0000  40 72 99 61 01 88 ff ff 53 02 00 06 01 00 00 00   @r.a....S.......
0010  22 ad e0 4f 00 00 00 00 1b 90 07 00 8d ff ff ff   "..O............
0020  07 00 00 00 07 00 00 00 21 09 10 02 02 00 07 00   ........!.......
0030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0040  10 01 09 03 78 01 00                              ....x..
What's interesting here is the last part, representing the data. wLength says that the length of the data is 7 bytes, so let's take a look at those 7 bytes: 10 01 09 03 78 01 00. Well, actually, you can't decode them like that, unless you're a freak or a Logitech engineer, and I actually have no idea what they mean. But sending this to the keyboard triggers an interesting thing: the keyboard starts sending interrupt URBs with some data, without you pressing any more keys.

The "light meter and battery values" packet
This is the most interesting packet. This is the one sent by the keyboard to the host, and it contains the data we want to retrieve.
Frame 1467: 84 bytes on wire (672 bits), 84 bytes captured (672 bits)
    Frame Length: 84 bytes (672 bits)
    Capture Length: 84 bytes (672 bits)
USB URB
    URB id: 0xffff88010c43c380
    URB type: URB_COMPLETE ('C')
    URB transfer type: URB_INTERRUPT (0x01)
    Endpoint: 0x83, Direction: IN
        1... .... = Direction: IN (1)
        .000 0011 = Endpoint value: 3
    Device: 2
    URB bus id: 6
    Device setup request: not relevant ('-')
    Data: present (0)
    URB sec: 1334953309
    URB usec: 728740
    URB status: Success (0)
    URB length [bytes]: 20
    Data length [bytes]: 20
    [Request in: 1466]
    [Time from request: 0.992374000 seconds]
    [bInterfaceClass: Unknown (0xffff)]
Leftover Capture Data: 1102091039000c061d474f4f4400000000000000
0000  80 c3 43 0c 01 88 ff ff 43 01 83 02 06 00 2d 00   ..C.....C.....-.
0010  5d c5 91 4f 00 00 00 00 a4 1e 0b 00 00 00 00 00   ]..O............
0020  14 00 00 00 14 00 00 00 00 00 00 00 00 00 00 00   ................
0030  02 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00   ................
0040  11 02 09 10 39 00 0c 06 1d 47 4f 4f 44 00 00 00   ....9....GOOD...
0050  00 00 00 00                                       ....
These packets come in regularly (one per second) on the wire for some time once you've sent the "go for the light meter" packet. At some point they are emitted less often and do not contain the value of the light meter anymore, suggesting that the control packet sent earlier activates the light meter for a defined period. Now you probably wonder where the data are in this. They're in the 20 leftover bytes of the capture data part, indicated by Wireshark, at the end of the packet: 11 02 09 10 39 00 0c 06 1d 47 4f 4f 44 00 00 00 00 00 00 00. Fortunately, it was easy to decode. Knowing we're looking for 2 values (battery charge and light meter), we just need to observe and compare the packets emitted on the wire with the values displayed by the Logitech Solar App. To achieve this, I looked both at the Logitech Solar App and Wireshark while bringing more and more light near the keyboard, increasing the lux value received by the meter on the Solar App, and saw that the fields shown below were changing in Wireshark. Since 2 bytes were changing, I guessed the value was coded on 16 bits, and therefore it was easy to correlate it with the Solar App.
[ ....9....GOOD....... ]
11 02 09 10 39 00 0c 06 1d 47 4f 4f 44 00 00 00 00 00 00 00
4 bytes - 1 byte for battery charge - 2 bytes for light meter - 2 bytes - 4 bytes for GOOD - 7 bytes
In this example, the battery has a charge of 0x39 = 57% and the light meter receives 0x0c = 12 lux of light. It's basically dark, and that makes sense: it was night and the light was off in my office, the only light being the one coming from my screen. I have no idea what the GOOD part of the packet is about, but it's present in every packet and it's actually very handy for recognizing such a packet, so I'm considering it as some sort of useful marker for now. The other bytes were always the same (0x11 0x2 0x9 0x10 at the beginning, 7 times 0x00 at the end). The 2 bytes between the light meter and GOOD probably mean something, but I have no idea what for now.

Building our solar app
Now we have enough information to build our own very basic solar application. We know how to trigger the light meter, and we know how to decode the packets. We're going to write a small application using libusb. Here's a quick example. It's not perfect and does not check for error codes, so be careful.
/* Written by Julien Danjou <julien> in 2012 */
#include <linux/hid.h>
#include <libusb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    libusb_context *ctx;
    libusb_init(&ctx);
    libusb_set_debug(ctx, 3);

    /* Look for the keyboard receiver based on vendor and device id */
    libusb_device_handle *device_handle = libusb_open_device_with_vid_pid(ctx, 0x046d, 0xc52b);
    fprintf(stderr, "Found keyboard 0x%p\n", device_handle);

    libusb_device *device = libusb_get_device(device_handle);
    struct libusb_device_descriptor desc;
    libusb_get_device_descriptor(device, &desc);

    for (uint8_t config_index = 0; config_index < desc.bNumConfigurations; config_index++)
    {
        struct libusb_config_descriptor *config;
        libusb_get_config_descriptor(device, config_index, &config);

        /* We know we want interface 2 */
        int iface_index = 2;
        const struct libusb_interface *iface = &config->interface[iface_index];

        for (int altsetting_index = 0; altsetting_index < iface->num_altsetting; altsetting_index++)
        {
            const struct libusb_interface_descriptor *iface_desc = &iface->altsetting[altsetting_index];
            if (iface_desc->bInterfaceClass == LIBUSB_CLASS_HID)
            {
                /* Take the interface away from the kernel driver while we talk to it */
                libusb_detach_kernel_driver(device_handle, iface_index);
                libusb_claim_interface(device_handle, iface_index);

                unsigned char ret[65535];
                /* The "go for the light meter" packet captured earlier */
                unsigned char payload[] = "\x10\x02\x09\x03\x78\x01\x00";
                if (libusb_control_transfer(device_handle,
                                            LIBUSB_REQUEST_TYPE_CLASS | LIBUSB_RECIPIENT_INTERFACE,
                                            HID_REQ_SET_REPORT,
                                            0x0210, iface_index, payload, sizeof(payload) - 1, 10000))
                {
                    /* Wait for a 20 byte interrupt packet carrying the GOOD marker */
                    int actual_length = 0;
                    while (actual_length != 20 || strncmp((const char *) &ret[9], "GOOD", 4))
                        libusb_interrupt_transfer(device_handle,
                                                  iface_desc->endpoint[0].bEndpointAddress,
                                                  ret, sizeof(ret), &actual_length, 100000);
                    uint16_t lux = ret[5] << 8 | ret[6];
                    fprintf(stderr, "Charge: %d %%\nLight: %d lux\n", ret[4], lux);
                }

                libusb_release_interface(device_handle, iface_index);
                libusb_attach_kernel_driver(device_handle, iface_index);
            }
        }
    }

    libusb_close(device_handle);
    libusb_exit(ctx);
}
</figure> What the program does is the following: it opens the Unifying Receiver using its vendor and product IDs, detaches the kernel driver from the HID interface and claims it, sends the control packet that activates the light meter, reads interrupt packets until it gets a 20-byte one carrying the GOOD marker, decodes the battery charge and the light value, and finally gives the interface back to the kernel. This gives the following:
Found keyboard 0x0x24ec8e0
Charge: 64 %
Light: 21 lux
Challenge accomplished! To be continued Unfortunately, this approach has at least one major drawback: we have to disconnect the Logitech Unifying Receiver from the kernel. That means that while we're waiting for the packet, we're dropping packets corresponding to other events from every connected device (key presses, pointer motions, and so on). In order to solve that, I sent a request for help on the linux-input mailing list. That way, I learned that Logitech is actually using the HID++ protocol to communicate with the devices behind the Unifying Receiver. Lars-Dominik Braun managed to get the HID++ specifications from Logitech and published them with their authorization. This opens a whole new world. With that document, I may be able to understand the part I reverse engineered and turn this into a more useful and generic library using the hidraw interface (so we don't have to disconnect the devices from the kernel driver).
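As a side note, the 20-byte payload is just as easy to decode outside of the C program. Here is a minimal Python sketch of the same decoding logic; the byte offsets are the ones deduced above, and the function name is mine, not part of any Logitech API:

# Minimal sketch: decode a 20-byte notification payload as analyzed above.
# Byte 4 is the battery charge, bytes 5-6 the light level (big endian),
# bytes 9-12 spell "GOOD"; the remaining bytes are still unknown.
def decode_solar_payload(payload):
    payload = bytearray(payload)
    if len(payload) != 20 or bytes(payload[9:13]) != b"GOOD":
        raise ValueError("not a solar keyboard notification")
    charge = payload[4]                    # battery charge, in percent
    lux = (payload[5] << 8) | payload[6]   # light meter value, in lux
    return charge, lux

# The leftover capture data from the Wireshark dump above
data = bytearray.fromhex("1102091039000c061d474f4f4400000000000000")
print(decode_solar_payload(data))  # (57, 12)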

2 July 2012

Julien Danjou: Making the jump: working freelance

For the last 10 years, I've been working on many Free Software projects, from Debian to OpenStack, through awesome, Emacs, XCB and many more. This obviously allowed me to enhance my technical skills, but it also taught me about Free Software and Open Source development processes, and how to work with and close to the community. Working for almost 6 years at Easter-eggs taught me how to work in an autonomous manner, how to lead and manage a project, and how to run a company, thanks to the cooperative status of this great one. These are the reasons why I decided to leave my latest job and start my own company to work as a freelance consultant & developer specialized in Free Software, starting today. I am therefore now able and available to provide expertise and development on Free Software, including upstream contribution, especially on projects I have worked on recently, like OpenStack.

29 June 2012

Julien Danjou: How to make Twitter's Bootstrap tabs bookmarkable

I've been using Twitter's Bootstrap library recently to build this Web site, and wondered how to use the bootstrap-tab Javascript plugin in a bookmark-friendly manner. I ended up with a simple solution. These are my first steps in Javascript and front-end manipulation, and it's really not my area of expertise, so don't be harsh. <figure>
function bootstrap_tab_bookmark (selector) {
    if (selector == undefined) {
        selector = "";
    }
    /* Automagically jump on good tab based on anchor */
    $(document).ready(function() {
        url = document.location.href.split('#');
        if(url[1] != undefined) {
            $(selector + '[href=#'+url[1]+']').tab('show');
        }
    });
    var update_location = function (event) {
        document.location.hash = this.getAttribute("href");
    };
    /* Update hash based on tab */
    $(selector + "[data-toggle=pill]").click(update_location);
    $(selector + "[data-toggle=tab]").click(update_location);
}
</figure> All you need to do is call this function with a selector (only useful if you have several tab/pill sets) when the document is ready. The first part takes care of showing the right tab based on the hash contained in the URL. The second part takes care of updating the document location with the current tab when the user clicks.

23 April 2012

Julien Danjou: OpenStack Swift eventual consistency analysis & bottlenecks

Swift is the software behind the OpenStack Object Storage service. This service provides a simple storage service for applications using RESTful interfaces, providing maximum data availability and storage capacity. I explain here how some parts of the storage and replication in Swift work, and show some of its current limitations. If you don't know Swift and want to read a more "shallow" overview first, you can read John Dickinson's Swift Tech Overview. How Swift storage works If we refer to the CAP theorem, Swift chose availability and partition tolerance and dropped consistency. That means that you'll always get your data, they will be dispersed in many places, but you could get an old version of them (or no data at all) in some odd cases (like a server overload or failure). This compromise is made to allow maximum availability and scalability of the storage platform. But there are mechanisms built into Swift to minimize the potential data inconsistency window: they are responsible for data replication and consistency. The official Swift documentation explains the internal storage in a certain way, but I'm going to write my own explanation here. Consistent hashing Swift uses the principle of consistent hashing. It builds what it calls a ring. A ring represents the space of all possible computed hash values divided into equivalent parts. Each part of this space is called a partition. The following schema (stolen from the Riak project) shows the principle nicely: the consistent hashing ring. In a simple world, if you wanted to store some objects and distribute them on 4 nodes, you would split your hash space in 4. You would have 4 partitions, and computing hash(object) modulo 4 would tell you where to store your object: on node 0, 1, 2 or 3. But since you want to be able to extend your storage cluster to more nodes without breaking the whole hash mapping and moving everything around, you need to build a lot more partitions. Let's say we're going to build 2^10 = 1024 partitions. Since we have 4 nodes, each node will have 2^10 / 4 = 256 partitions. If we ever want to add a 5th node, it's easy: we just have to re-balance the partitions and move about a fifth of the partitions from each node to this 5th node. That means all our nodes will end up with 2^10 / 5 ≈ 204 partitions. We can also define a weight for each node, in order for some nodes to get more partitions than others. With 2^10 partitions, we can have up to 2^10 nodes in our cluster. Yeepee. For reference, Gregory Holt, one of the Swift authors, also wrote an explanation post about the ring. Concretely, when building a Swift ring, you'll have to say how many partitions you want, and this is what that value is really about. Data duplication Now, to assure availability and partition tolerance (as seen in the CAP theorem) we also want to store replicas of our objects. By default, Swift stores 3 copies of every object, but that's configurable. In that case, we need to store each partition defined above not only on 1 node, but also on 2 others. So Swift adds another concept: zones. A zone is an isolated space that does not depend on other zones, so in case of an outage on one zone, the other zones are still available. Concretely, a zone is likely to be a disk, a server, or a whole cabinet, depending on the size of your cluster. It's up to you to choose anyway. Consequently, each partition no longer has to be mapped to 1 host only, but to N hosts. Each node will therefore store this number of partitions:
number of partitions stored on one node = (number of replicas × total number of partitions) / number of nodes
Examples:
We split the ring in 2^10 = 1024 partitions. We have 3 nodes. We want 3 replicas of data.
Each node will store a copy of the full partition space: 3 × 2^10 / 3 = 2^10 = 1024 partitions.
We split the ring in 2^11 = 2048 partitions. We have 5 nodes. We want 3 replicas of data.
Each node will store 3 × 2^11 / 5 ≈ 1229 partitions.
We split the ring in 2^11 = 2048 partitions. We have 6 nodes. We want 3 replicas of data.
Each node will store 3 × 2^11 / 6 = 1024 partitions.
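To make this arithmetic concrete, here is a small Python sketch of the same computation. It is not Swift's actual ring code: the md5-based mapping and the names are illustrative assumptions following the description above.

import hashlib

PART_POWER = 11                     # ring split in 2^11 = 2048 partitions
NUM_PARTITIONS = 2 ** PART_POWER
REPLICAS = 3
NODES = 5

def partition_for(name):
    # Map an object name to one of the 2^11 partitions
    # (hash modulo the partition count, as described above).
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % NUM_PARTITIONS

def partitions_per_node(replicas, num_partitions, nodes):
    # number of replicas × total number of partitions / number of nodes
    return replicas * num_partitions / float(nodes)

print(partition_for("account/container/object"))
print(partitions_per_node(REPLICAS, NUM_PARTITIONS, NODES))  # 1228.8, about 1229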
Three rings to rule them all In Swift, there are 3 categories of things to store: accounts, containers and objects. An account is what you'd expect it to be, a user account. An account contains containers (the equivalent of Amazon S3's buckets). Each container can contain user-defined keys and values (just like a hash table or a dictionary): the values are what Swift calls objects. Swift wants you to build 3 different and independent rings to store its 3 kinds of things (accounts, containers and objects). Internally, the first two categories are stored as SQLite databases, whereas the last one is stored using regular files. Note that these 3 rings can be stored and managed on 3 completely different sets of servers. Data replication Now that we have our storage theory in place (accounts, containers and objects distributed into partitions, themselves stored into multiple zones), let's look at replication in practice. When you put something in one of the 3 rings (be it an account, a container or an object) it is uploaded into all the zones responsible for the ring partition the object belongs to. This upload into the different zones is the responsibility of the swift-proxy daemon. But if one of the zones is failing, you can't upload all your copies into all zones at upload time. So you need a mechanism to be sure the failing zone will catch up to a correct state at some point. That's the role of the swift-{container,account,object}-replicator processes. These processes run on each node that is part of a zone and replicate their contents to the nodes of the other zones. When they run, they walk through all the contents of all the partitions on the whole file system and, for each partition, issue a special REPLICATE HTTP request to all the other zones responsible for that same partition. The other zone responds with information about the local state of the partition. That allows the replicator process to decide whether the remote zone has an up-to-date version of the partition. For accounts and containers, it doesn't check at the partition level, but checks each account/container contained inside each partition. If something is not up-to-date, it will be pushed using rsync by the replicator process. This is why you'll read that the replication updates are "push based" in the Swift documentation.
# Pseudo code describing replication process for accounts
# The principle is exactly the same for containers
for account in accounts:
    # Determine the partition used to store this account
    partition = hash(account) % number_of_partitions
    # The number of zones is the number of replicas configured
    for zone in partition.get_zones_storing_this_partition():
        # Send an HTTP REPLICATE command to the remote swift-account-server process
        version_of_account = zone.send_HTTP_REPLICATE_for(account)
        if version_of_account < account.version():
            account.sync_to(zone)
This replication process is O(number of accounts × number of replicas). The more accounts you have and the more replicas you want for your data, the longer the replication of your accounts will take. The same rule applies to containers.
# Pseudo code describing replication process for objects
for partition in partitions_storing_objects:
    # The number of zones is the number of replicas configured
    for zone in partition.get_zones_storing_this_partition():
        # Send an HTTP REPLICATE command to the remote swift-object-server process
        version_of_partition = zone.send_HTTP_REPLICATE_for(partition)
        if version_of_partition < partition.version():
            # Use rsync to synchronize the whole partition
            # and all its objects
            partition.rsync_to(zone)
This replication process is O(number of object partitions × number of replicas). The more object partitions you have and the more replicas you want for your data, the longer the replication of your objects will take. I think this is something important to know when deciding how to build your Swift architecture: choose the right number of replicas, partitions and nodes. Replication process bottlenecks File accesses The problem, as you might have guessed, is that to replicate, it walks through every damn thing, things being accounts, containers, or objects' partition hash files. This means it needs to open and read (part of) every file your node stores to check whether or not data needs to be replicated! For account and container replication, this is done every 30 seconds by default, but it will likely take more than 30 seconds as soon as you hit around 12 000 containers on a node (see measurements below). Therefore you'll end up checking the consistency of accounts and containers on every node all the time, obviously using a lot of CPU time. For reference, Alex Yang also did an analysis of that same problem. TCP connections Worse, the HTTP connections used to send the REPLICATE commands are not pooled: a new TCP connection is established each time something has to be checked against the same thing stored on a remote zone. This is why you'll see in Swift's Deployment Guide these lines listed under "general system tuning":
# disable TIME_WAIT.. wait..
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
# double amount of allowed conntrack
net.ipv4.netfilter.ip_conntrack_max = 262144
In my humble opinion, this is more an ugly hack than a tuning. If you don't activate this and you have a lot of containers on your node, you'll soon end up with thousands of connections in TIME_WAIT state, and you indeed risk overloading the IP conntrack module. Container deletion We should also talk about container deletion. When a user deletes a container from its account, the container is marked as deleted. And that's it. It's not deleted. Therefore the SQLite database file representing the container will continue to be checked for synchronization, over and over. The only way to have a container permanently deleted is to mark an account as deleted. This way the swift-account-reaper will delete all its containers and, finally, the account. Measurement On a pretty big server, I measured the replication to be done at a speed of around 350 accounts/containers/object-partitions per second, which can be a real problem if you chose to build a lot of partitions and you have a low number_of_nodes / number_of_replicas ratio. For example, the default parameters run the container replication every 30 seconds. To check the replication status of 12 000 containers stored on one node at a speed of 350 containers/second, you'll need around 34 seconds to do so. In the end, you'll never stop checking the replication of your containers, and the more containers you have, the larger your inconsistency window will grow. Conclusion Until some of the code is fixed (the HTTP connection pooling probably being the "easiest" one), I warmly recommend choosing the different Swift parameters carefully for your setup. Optimizing the replication process consists in having the minimum amount of partitions per node which, given the formula seen earlier, means not building more partitions and not keeping more replicas than you need for your number of nodes. For very large setups, some code to speed up account and container synchronization, and to remove deleted containers, will be required, but this does not exist yet, as far as I know.
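To put numbers on that conclusion, here is a tiny Python sketch of the back-of-the-envelope calculation above; the 350 checks per second figure is the one measured above and will obviously vary with your hardware.

# Estimate how long one container replication pass takes on a node,
# compared with the default 30 second replication interval.
def replication_pass_seconds(containers_on_node, checks_per_second=350):
    return containers_on_node / float(checks_per_second)

INTERVAL = 30  # default container replication interval, in seconds
for n in (5000, 12000, 50000):
    t = replication_pass_seconds(n)
    verdict = "fits in the interval" if t < INTERVAL else "exceeds the interval"
    print("%6d containers: %.0f seconds per pass (%s)" % (n, t, verdict))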

17 April 2012

Julien Danjou: First release of PyMuninCli

Today I release a Python client library to query Munin servers. I wrote it as part of some experiments I did a few weeks ago, when I discovered there was no client library to query a Munin server. There's PyMunin or python-munin, which help with developing Munin plugins, but nothing to access a munin-node and retrieve its data. So I decided to write a quick and simple one, and it's released under the name PyMuninCli, providing the munin.client Python module.
